Stochasticity in gene expression gives rise to fluctuations in protein levels across a population of
genetically identical cells. Such fluctuations can lead to phenotypic variation in clonal populations, 
hence there is considerable interest in quantifying noise in gene expression using stochastic models. 
However, obtaining exact analytical results for protein distributions has been an intractable task 
for all but the simplest models. Here, we invoke the partitioning property of Poisson processes to 
develop a mapping that significantly simplifies the analysis of stochastic models of gene expression. 
The mapping leads to exact protein distributions using results for mRNA distributions in models 
with promoter-based regulation. Using this approach, we derive exact analytical results for steady-state
and time-dependent distributions for the basic 2-stage model of gene expression. Furthermore,
we show how the mapping leads to exact protein distributions for extensions of the basic model that 
include the effects of post-transcriptional and post-translational regulation. The approach developed 
in this work is widely applicable and can contribute to a quantitative understanding of stochasticity 
in gene expression and its regulation. 
INTRODUCTION

One of the fundamental problems in biology is the elucidation of molecular mechanisms that give
rise to phenotypic variations among individuals in a population. Recent research has shown that
phenotypic variations can arise without any underlying differences in the genotype or environmental
factors (1, 2). Such 'non-genetic individuality' is driven by fluctuations (noise) in cellular levels of
gene expression products, as observed in diverse processes ranging from bacterial persistence (3) to
HIV-1 viral infections (4). Quantifying and modeling noise in gene expression is thus an important
step towards a fundamental understanding of phenotypic variation among genetically identical cells.

Noise in gene expression is generally analyzed using coarse-grained stochastic models (5-8). For
such models, cellular variations can be characterized using the mean and variance of mRNA and
protein distributions (9-15). However, in several cases, it is of interest to characterize the entire
distribution, rather than just the mean and variance. For example, it has been demonstrated that
protein distributions can exhibit features such as bimodality (10) that are not adequately
represented using the first two moments alone. Since protein levels in single cells can be measured
experimentally (11, 12), developing analytical approaches for protein distributions is an important
contribution towards building quantitative models of gene expression.

Given the need for analytical results for the entire distribution, several approaches have been
developed in recent work. Analytical results for mRNA distributions have been derived (13-19);
however, the corresponding results for proteins have been significantly more challenging to obtain.
When the mean mRNA lifetimes ($\tau_m$) are much shorter than protein lifetimes ($\tau_p$),
analytical expressions have been derived for protein steady-state distributions (20, 21). More
generally, exact results have recently been derived (22) for the simplest model of gene expression,
also known as the 2-stage model. While useful results have thus been obtained, further
generalizations are needed to include a broader class of models that include the effects of cellular
regulation.

In this paper, we develop an analytical framework that 
leads to exact protein distributions for a wide range of 
stochastic models of gene expression. In the following 
section, we provide brief definitions of some basic con- 
cepts used in the analysis. 



MASTER EQUATION AND GENERATING 
FUNCTIONS 



Defining $\Phi(X,t)$ as the probability of finding the system under consideration in a given state
X at a time t, the corresponding master equation is given by

$$\partial_t \Phi(X,t) = \sum_{Y} \left[ \Phi(Y,t)\, w_{Y \to X} - \Phi(X,t)\, w_{X \to Y} \right] \tag{1}$$

where $w_{X \to Y}$ is the rate of transition from X to Y.



It is often the case that the state of the system (X) is fully characterized by a set of integers
($\{n_i\}$) such as the number of mRNAs, proteins, etc. It follows that the probability distribution
becomes $\Phi(\{n_i\},t)$. The corresponding generating function G (a function of a set of
continuous variables $\{z_i\}$) is defined by

$$G(\{z_i\},t) = \sum_{\{n_i\}} \Big( \prod_i z_i^{n_i} \Big)\, \Phi(\{n_i\},t) \tag{2}$$

All the moments of the probability distribution $\Phi(\{n_i\},t)$ can be obtained from G by
successive differentiation. Finally, the entire probability distribution can also be obtained from
the expression for G, either analytically or by using numerical approaches. In the following, we
develop an analytical framework for obtaining the generating function G for protein distributions
in stochastic models of gene expression.
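As a hedged illustration of the numerical route, the sketch below recovers a distribution and its mean from a one-variable generating function. It assumes that G can be evaluated at complex arguments on the unit circle; the helper names `pmf_from_pgf` and `mean_from_pgf`, and the Poisson test case, are illustrative choices and not part of the original analysis.

```python
import cmath
import math


def pmf_from_pgf(G, n_max, K=256):
    """Recover P(0..n_max) from a generating function G(z) = sum_n z^n P(n).

    G is sampled at K points on the unit circle and inverted with a discrete
    Fourier sum. K must exceed n_max and be large enough that the probability
    of n >= K is negligible (otherwise the result is aliased).
    """
    samples = [G(cmath.exp(2j * math.pi * k / K)) for k in range(K)]
    pmf = []
    for n in range(n_max + 1):
        coeff = sum(s * cmath.exp(-2j * math.pi * k * n / K)
                    for k, s in enumerate(samples)) / K
        pmf.append(coeff.real)
    return pmf


def mean_from_pgf(G, h=1e-5):
    """Approximate the mean, dG/dz at z = 1, by a central finite difference."""
    return ((G(1 + h) - G(1 - h)) / (2 * h)).real


def poisson_pgf(z, lam=3.0):
    """Illustrative test case: generating function of a Poisson distribution."""
    return cmath.exp(lam * (z - 1))
```

Calling `pmf_from_pgf(poisson_pgf, 10)` returns the first eleven Poisson probabilities; the same routine can be applied to any generating function that is analytic on the unit disk.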



MAPPING TO REDUCED MODELS 

We will consider models of gene expression for which the creation of mRNAs is a Poisson process
occurring with rate $k_m$. Invoking a well-known theorem on the partitioning of Poisson processes
(23), we develop a mapping that significantly simplifies analysis of such models.

We begin by partitioning the mRNA arrivals into N 'types' (Fig. 1A). Given an mRNA arrival at any
time t, the probability that it is assigned to type i ($i = 1 \ldots N$) is $q_i = 1/N$. Thus each
mRNA is equally likely to be assigned to one of the N types upon arrival. Denoting by
$\mathcal{N}_i(t)$ the number of arrivals of the i-th type of mRNA by time t, it follows from the
theorem of partitioning of Poisson processes (23) that the arrival of each type of mRNA is an
independent Poisson process occurring with rate $k_m/N$ (Fig. 1A). In other words, the
$\mathcal{N}_i(t)$ ($i = 1 \ldots N$) are independent Poisson random variables with mean
$\langle \mathcal{N}_i(t) \rangle = k_m t/N$.
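The partitioning step can be illustrated with a small Monte Carlo sketch. It generates Poisson arrivals at rate k_m, assigns each arrival to one of N types with probability 1/N, and estimates the mean count per type, which should approach k_m t/N. The function names and parameter values are assumptions made for this illustration only.

```python
import random


def partitioned_counts(k_m, t_end, n_types, rng):
    """Simulate one Poisson arrival stream on [0, t_end) and split it by type."""
    counts = [0] * n_types
    t = rng.expovariate(k_m)            # waiting time to the first arrival
    while t < t_end:
        counts[rng.randrange(n_types)] += 1   # each type has probability 1/N
        t += rng.expovariate(k_m)
    return counts


def type_means(k_m, t_end, n_types, trials, seed=0):
    """Average number of arrivals of each type over many independent runs."""
    rng = random.Random(seed)
    totals = [0] * n_types
    for _ in range(trials):
        for i, c in enumerate(partitioned_counts(k_m, t_end, n_types, rng)):
            totals[i] += c
    return [s / trials for s in totals]
```

With k_m = 5, t = 10 and N = 4, each entry of `type_means(5.0, 10.0, 4, 2000)` should lie close to k_m t/N = 12.5, up to sampling error.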

The next step consists of taking the limit $N \to \infty$ and leads to the definition of the reduced
model. For any given time t, in the limit $N \to \infty$, the probability of arrival of more than one
mRNA of any given type can be neglected. It follows that the random variable describing the number
of mRNAs of a given type is constrained to the value 0 or 1. Effectively, after partitioning of the
Poisson arrival process, the mRNA dynamics can be replaced by the dynamics of a 2-state system.
Thus, at any time t, we have a mapping from the original system to N identical subsystems. In the
limit $N \to \infty$, each of these subsystems corresponds to what will be referred to as a 'reduced'
model. Further details on the connection between original and reduced models are provided in
Appendix A. In the following, we will refer to this approach as the PPA (Partitioning of Poisson
Arrivals) mapping.

FIG. 1. (A) A Poisson arrival process with arrival rate $k_m$ can be partitioned into N independent
and identical Poisson arrival processes, each occurring with rate $k_m/N$. (B) Partitioning of the
Poisson arrival process leads to a mapping from a simple model of creation and decay of mRNAs to N
independent, identical 2-state systems (in the limit $N \to \infty$). The probability of having m
mRNAs in the original model is equivalent to the probability of having m two-state systems in the ON
state in the reduced model. (C) The same mapping applied to the 2-stage model of gene expression for
proteins. Note that the reduced model is identical to a model for creation and decay of mRNAs with
promoter-based regulation.

As an illustration, let us consider the number of mRNAs for the simple model shown in Fig. 1B. It is
readily derived (e.g. using the master equation) that the corresponding steady-state distribution is
a Poisson distribution with mean $k_m/\mu_m$. This result can also be obtained using the PPA mapping,
as illustrated in Fig. 1B. The dynamics of the reduced model (a 2-state model) is defined by the
transitions between the 0 mRNA (OFF) and 1 mRNA (ON) states driven by the rates $k_m/N$ and $\mu_m$.
Therefore, the steady-state generating function for mRNAs in the reduced model is given by

$$g(z) = \left( 1 - \frac{k_m/N}{\mu_m + k_m/N} \right) + \frac{k_m/N}{\mu_m + k_m/N}\, z .$$

Correspondingly, the generating function for the distribution of mRNAs in the original model is
given by $G(z) = \lim_{N \to \infty} [g(z)]^N$. This expression reduces to the generating function
of the Poisson distribution with mean $k_m/\mu_m$, thereby recovering the well-known result. An
explicit derivation illustrating this approach using the master equation is provided in Appendix B.
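The limit can be checked numerically. The sketch below evaluates the N-th power of the 2-state generating function, with ON probability (k_m/N)/(mu_m + k_m/N), and compares it with the Poisson generating function exp[(k_m/mu_m)(z - 1)]. The function names and the parameter values used in the comparison are illustrative assumptions.

```python
import math


def reduced_pgf(z, k_m, mu_m, N):
    """Steady-state generating function of one 2-state (OFF/ON) subsystem."""
    p_on = (k_m / N) / (mu_m + k_m / N)
    return (1.0 - p_on) + p_on * z


def original_pgf_from_mapping(z, k_m, mu_m, N):
    """[g(z)]^N for N independent, identical 2-state subsystems."""
    return reduced_pgf(z, k_m, mu_m, N) ** N


def poisson_pgf(z, k_m, mu_m):
    """Generating function of a Poisson distribution with mean k_m / mu_m."""
    return math.exp((k_m / mu_m) * (z - 1.0))


def mapping_error(z, k_m, mu_m, N):
    """Absolute difference between the finite-N mapping and the Poisson limit."""
    return abs(original_pgf_from_mapping(z, k_m, mu_m, N) - poisson_pgf(z, k_m, mu_m))
```

For example, `mapping_error(0.5, 10.0, 1.0, N)` shrinks as N is increased, consistent with the finite-N correction being of order 1/N.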

The preceding argument can be generalized to analyze the distribution of proteins in stochastic
models of gene expression. In order to apply the PPA mapping, we will consider models for which the
protein production from each mRNA proceeds independently. Let P(t) be the random variable
corresponding to the number of proteins in the system at time t. Partitioning the mRNAs into N
'types', we denote by $p_i$ the random variable corresponding to the number of proteins created by
the i-th type of mRNA. Note that, in the limit $N \to \infty$, $p_i$ is the random variable
corresponding to the distribution of proteins in the reduced model. Since each mRNA contributes
independently, the $p_i(t)$ are independent, identically distributed random variables such that
$P = \sum_{i=1}^{N} p_i$. Correspondingly, the generating functions for proteins in the original
($G(z,t)$) and reduced ($g(z,t)$) models are related by

$$G(z,t) = \lim_{N \to \infty} \left[ g(z,t) \right]^N \tag{3}$$

Furthermore, it can be shown (Appendix A) that $(g(z,t) - 1) \propto k_m t/N$, leading to

$$G(z,t) = \lim_{N \to \infty} \exp\left[ N \left( g(z,t) - 1 \right) \right] \tag{4}$$



The significance of the above mapping lies in the fact that 
it exactly maps the original problem (obtaining G{z,t)) 
to a reduced problem (obtaining g(z,t)) which is easier 
to analyze. The simplification provided by this mapping 
derives from the fact that the number of mRNAs, which 
is unbounded in the original model, is effectively replaced 
by a 2-state system in the reduced model. 

Using Eq. 4 we can readily connect expressions for the mean and Fano factor of the original model to
the corresponding expressions for the reduced model (Appendix A). In particular, we show that the
Fano factors for the original and reduced models are identical (in the limit $N \to \infty$). This
is a useful result since it is generally easier to obtain the Fano factor for the reduced model.



EXACT DISTRIBUTIONS FOR THE 2-STAGE 
MODEL 

We now show how the PPA mapping directly leads to exact results for protein distributions in the
2-stage model (Fig. 1C). The 2-stage model is the simplest model of stochastic gene expression and
has been widely analyzed in both theoretical and experimental studies. While exact results for
steady-state distributions have been derived recently (22), the corresponding results for
time-dependent distributions have not been obtained so far.
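For readers who wish to compare analytical distributions with simulation, the following is a minimal Gillespie-type sketch of the 2-stage model. It assumes the standard kinetic scheme used in the master equation of Appendix C: mRNAs are created at rate k_m and degraded at rate mu_m per molecule, and each mRNA produces proteins at rate k_p, with proteins degraded at rate mu_p per molecule. The function name and the time-averaging estimator are illustrative choices, not the method of this paper.

```python
import random


def simulate_two_stage(k_m, mu_m, k_p, mu_p, t_end, seed=0):
    """Gillespie simulation of the 2-stage model, started from m = p = 0.

    Returns the time-averaged protein mean and Fano factor (variance / mean).
    """
    rng = random.Random(seed)
    t, m, p = 0.0, 0, 0
    s1 = s2 = 0.0                      # time integrals of p and p^2
    while t < t_end:
        rates = (k_m, mu_m * m, k_p * m, mu_p * p)
        total = sum(rates)
        dt = min(rng.expovariate(total), t_end - t)
        s1 += p * dt
        s2 += p * p * dt
        t += dt
        if t >= t_end:
            break
        u = rng.random() * total       # pick the reaction that fires
        if u < rates[0]:
            m += 1                     # transcription
        elif u < rates[0] + rates[1]:
            m -= 1                     # mRNA decay
        elif u < rates[0] + rates[1] + rates[2]:
            p += 1                     # translation
        elif p > 0:
            p -= 1                     # protein decay
    mean = s1 / t_end
    var = s2 / t_end - mean ** 2
    return mean, var / mean
```

For long runs the estimates can be compared with the well-known moments of this model, mean k_m k_p/(mu_m mu_p) and Fano factor 1 + k_p/(mu_m + mu_p); the initial transient contributes a small bias that decreases with t_end.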

Using the PPA mapping (Fig. 1C), we see that the reduced model (obtained by replacing each type of
mRNA by a 2-state system) for proteins is equivalent to a model for mRNAs with promoter switching.
An explicit derivation of the reduced model, starting from the master equation, is provided in
Appendix C. The reduced model has been studied in previous work and analytical results for the
corresponding mRNA distributions have been obtained (13, 14). Using these results, the generating
function for the steady-state distribution of proteins in the reduced model is given by

$$g^*(z) = {}_1F_1\left( \frac{k_m}{N \mu_p};\ \frac{k_m}{N \mu_p} + \frac{\mu_m}{\mu_p};\ \frac{k_p}{\mu_p}(z - 1) \right) \tag{5}$$

Now, using Eq. 4, we obtain that the protein steady-state generating function for the 2-stage model
is given by

$$G^*(z) = \lim_{N \to \infty} \exp\left\{ N \left[ {}_1F_1\left( \frac{k_m}{N \mu_p};\ \frac{k_m}{N \mu_p} + \frac{\mu_m}{\mu_p};\ \frac{k_p}{\mu_p}(z - 1) \right) - 1 \right] \right\} \tag{6}$$

Equation 6, derived directly from known results, is equivalent to the exact result derived recently
using a different approach (Appendix C). The concise derivation presented above highlights a general
point: the PPA mapping approach leads to protein distributions using results for mRNA distributions
for models with promoter-based regulation.
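A hedged numerical sketch of how the steady-state result can be evaluated is given below. It assumes the reduced-model generating function 1F1(a; a + lam; x) with a = k_m/(N mu_p), lam = mu_m/mu_p and x = (k_p/mu_p)(z - 1). Expanding the hypergeometric series term by term and taking N to infinity gives ln G(z) = (k_m/mu_p) * sum over n >= 1 of x^n / (n (lam)_n), where (lam)_n is the rising factorial. This series form is a rearrangement made for the illustration, and the function names are hypothetical; the distribution is then recovered by sampling G on the unit circle.

```python
import cmath
import math


def log_G(z, k_m, mu_m, k_p, mu_p, n_terms=200):
    """ln G(z) in the N -> infinity limit, summed as a power series in x."""
    lam = mu_m / mu_p
    x = (k_p / mu_p) * (z - 1)
    term = 1.0 + 0j                 # holds x^n / (lam)_n
    total = 0j
    for n in range(1, n_terms + 1):
        term *= x / (lam + n - 1)
        total += term / n
    return (k_m / mu_p) * total


def protein_pmf(k_m, mu_m, k_p, mu_p, K=256):
    """Steady-state protein distribution P(0..K-1) via a discrete Fourier sum.

    K must be large enough that the probability of n >= K is negligible.
    """
    samples = [cmath.exp(log_G(cmath.exp(2j * math.pi * k / K), k_m, mu_m, k_p, mu_p))
               for k in range(K)]
    return [sum(s * cmath.exp(-2j * math.pi * k * n / K)
                for k, s in enumerate(samples)).real / K
            for n in range(K)]
```

The moments of the returned distribution can be compared with the known mean k_m k_p/(mu_m mu_p) and Fano factor 1 + k_p/(mu_m + mu_p). For large k_p/mu_p the alternating series loses precision on the unit circle, so this direct summation is only suitable for moderate parameter values.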

We now apply the PPA mapping to obtain the time-dependent joint distribution of mRNAs and proteins
in the original model (with generating function $G(y,z,t)$) using the time-dependent distribution of
proteins in the reduced model (with generating function $g(z,t)$). As noted, the reduced model is
equivalent to a model for mRNAs with promoter-based regulation and the corresponding result for the
time-dependent generating function of the mRNA distribution has been derived in previous work (15).
Using this result to obtain $g(z,t)$, we derive (Appendix C) that the time-dependent joint generating
function of mRNAs and proteins is given by

$$G(y,z,t) = \lim_{N \to \infty} \exp\left\{ N \left[ g(z,t) + (y - 1) \frac{\mu_p}{k_p} \partial_z g(z,t) + \frac{y - 1}{k_p (z - 1)} \partial_t g(z,t) - 1 \right] \right\} \tag{7}$$

Eq. 7 is the most general exact result for the 2-stage model of gene expression and all the
previously derived results can be obtained from it by taking appropriate limits.



EXACT RESULTS FOR EXTENSIONS OF 
2-STAGE MODEL 

A Model with multi-step mRNA processing 

We now show how the partitioning of Poisson processes leads to exact results for some biologically
motivated extensions of the 2-stage model. Fig. 2 presents an extension that allows for an arbitrary
number of processing steps for mRNAs. For example, in eukaryotes, these processing steps can
represent reactions such as polyadenylation and transport to the cytoplasm which are required for
production of a processed mRNA that is competent for translation. We will call such a processed mRNA
a mature mRNA (whereas the unprocessed initial transcript will simply be referred to as an mRNA).
Let us now consider the arrival process of a mature mRNA.

FIG. 2. (A) In this model mRNAs undergo multi-step pre-processing before being competent to produce
proteins. Proteins can be created only from the mature mRNA created in the final processing step.
(B) Arrival of mature mRNAs is shown to be a Poisson process in steady-state leading to the reduced
model shown.

The kinetic scheme for the model with r pre-processing steps leading to mature mRNAs is shown in
Fig. 2A. In the following, we invoke the partitioning property of Poisson processes to show that the
arrival process of a mature mRNA, in the steady-state limit, is a Poisson process. At any time t, we
partition the transcribed mRNAs into 2 types: Type 1 corresponds to a transcribed mRNA that is
converted to a mature mRNA by time t and Type 2 includes all the remaining transcribed mRNAs. Let us
denote the probability that a transcribed mRNA is classified as Type 1 at time t by $q(t)$. Thus
$q = \lim_{t \to \infty} q(t)$ is the probability that an mRNA transcribed at $t = 0$ is eventually
converted into a mature mRNA. Given an mRNA in the i-th state ($1 \le i \le r - 1$), the probability
that it is converted into the (i + 1)-th intermediate state without being degraded is
$\left( \frac{k_i}{k_i + \mu_i} \right)$. Thus, in the long-time limit, we have

$$q = \prod_{i=1}^{r-1} \frac{k_i}{k_i + \mu_i} \tag{8}$$

Note that the arrival process of transcribed mRNAs (Type 1 or Type 2) is a Poisson process with rate
$k_m$. In the steady-state limit, the probability that a transcribed mRNA is labeled as Type 1 is q.
Thus, invoking the partitioning theorem for Poisson processes, we obtain that the arrival process
for a Type 1 mRNA (in the steady-state limit) is a Poisson process occurring with rate

$$k_{eq} = k_m q \tag{9}$$

Since an mRNA is classified as Type 1 once it becomes a mature mRNA, it follows that the arrival
process of mature mRNAs, in the steady-state limit, is a Poisson process with rate $k_{eq}$. Some
interesting results follow from the preceding observation. First, in the steady-state limit, since
mature mRNAs arrive according to a Poisson process, the corresponding reduced model is a 2-state
model (as in Fig. 1B). Thus the steady-state distribution of mature mRNAs is a Poisson distribution
with mean $k_{eq}/\mu_m$. Furthermore, the model for proteins is the same as the basic 2-stage model
(Fig. 1C), but with $k_m$ replaced by $k_{eq}$ (Fig. 2B). Correspondingly, the exact protein
steady-state distribution is given by Eq. 7 with the substitution $k_m \to k_m q$. Thus, we obtain
that the exact steady-state distribution of proteins for the model in Fig. 2 is given by

$$G^*(z) = \lim_{N \to \infty} \exp\left\{ N \left[ {}_1F_1\left( \frac{k_m q}{N \mu_p};\ \frac{k_m q}{N \mu_p} + \frac{\mu_m}{\mu_p};\ \frac{k_p}{\mu_p}(z - 1) \right) - 1 \right] \right\} \tag{10}$$

FIG. 3. (A) Kinetic scheme for model with a fixed-time delay in the degradation of proteins. Protein
molecules after being tagged (with rate $\gamma$) are degraded after a fixed time delay $\tau$.
(B) Mapping of the original model (A) to N independent, identical reduced models ($N \to \infty$).


B Model with delayed degradation

The PPA mapping approach can also be applied to models that include non-Markovian processes. An
example involving post-translational regulation leading to a constant delay in the degradation of
proteins is illustrated in Fig. 3. The degradation of proteins typically occurs via complex
proteolytic pathways involving multiple steps of tagging and binding of auxiliary proteins. A
simplifying assumption that is commonly used is to replace multi-step degradation by a fixed time
delay, which motivates the model outlined in Fig. 3. Recent work has analyzed protein steady-state
distributions for models with a constant time delay in protein degradation (24-26). However, the
processes of transcription and translation are generally lumped together and it is assumed that
proteins are produced in a single step from the DNA in these models. The PPA mapping approach allows
us to obtain the exact steady-state protein distributions for a simplified model which includes both
mRNAs and proteins. A detailed derivation (Appendix D) leads to the generating function for
arbitrary values of $\tau$. For simplicity, we present here the results in the limit $\tau \ll 1$
Several recent experiments have focused on quantifying variations in gene expression and on
inference of the underlying mechanisms based on observations of noise (27). Correspondingly, there is
a clear need for theoretical tools to complement such experimental efforts to understand the role of
noise in gene expression in diverse cellular processes. The current work addresses this need by
developing an analytical framework for obtaining protein distributions for stochastic models of gene
expression.

We have shown how the partitioning of Poisson arrival processes can lead to equivalent reduced
models that are, in general, simpler to analyze. This mapping can be used to derive exact results
for protein distributions using mRNA distributions for models with promoter-based regulation. In
recent work, analytical results have been derived for mRNA distributions for a general class of
models with promoter-based regulation (16, 17). These results, in combination with the PPA mapping
approach developed in this work, can be used to obtain exact protein distributions for a broad class
of gene expression models. Furthermore, previous work (28) has shown how a representation using
generating functions can be used in developing a variational approach for modeling stochastic
cellular processes. Thus the results obtained in this work, in combination with such variational
approaches, can be used to provide quantitative insights into the role of different kinetic schemes
in regulating the noise in gene expression.

Noise in gene expression has been shown to play a critical role in diverse cellular processes (P).
It is increasingly becoming clear that quantifying and modeling gene expression variations among
single cells in a population can lead to fundamental new insights into old problems. The approach
developed in this work can be used to obtain analytical results for multiple extensions of the basic
gene expression models. It can be generalized to analyze models including promoter-based regulation,
in particular the so-called standard model of gene expression (29). As more cellular processes are
studied using single-cell approaches, the results obtained can guide analysis and interpretation of
such experiments. As currently formulated, the approach cannot be used for models with feedback
effects (i.e., with rates that depend on protein numbers); however, it is hoped that future work will
address this issue building on current insights. It will also be of interest to extend the PPA
mapping approach developed in this work to a broader range of cellular processes for which
stochastic effects are critical.



APPENDIX 

A. CONNECTING ORIGINAL AND REDUCED 
MODELS 

In this section we discuss the relations between the generating functions of the original and
reduced models. To begin, we note that the number of mRNAs (M) and proteins (P) in the original
process are respectively given by the sum of the number of mRNAs (m) and proteins (p) in the N
independent and identical reduced processes. We define $\Phi_M(P,t)$ ($\phi_m(p,t)$) as the joint
probability to find M (m) mRNAs and P (p) proteins in the original (reduced) process at time t. The
probability distributions of the original and reduced processes are related via

$$\Phi_M(P,t) = \sum_{\{m_i\}} \sum_{\{p_i\}} \left[ \prod_{i=1}^{N} \phi_{m_i}(p_i,t) \right] \delta\Big( M - \sum_{i=1}^{N} m_i \Big)\, \delta\Big( P - \sum_{i=1}^{N} p_i \Big) \tag{12}$$

where $\delta(X - Y) = 1$ for $X = Y$ and zero otherwise. It follows that the generating functions,
defined by

$$G(y,z,t) = \sum_{M,P} y^M z^P\, \Phi_M(P,t), \qquad g(y,z,t) = \sum_{m,p} y^m z^p\, \phi_m(p,t),$$

are related by

$$G(y,z,t) = \left[ g(y,z,t) \right]^N \tag{15}$$

as expected for sums of independent and identically distributed random variables. For large N
values, successive differentiation shows that the averages in both models are related via

(17)

Correspondingly the Fano factors for the protein distributions are related by
$F_g = F_G - \langle P \rangle / N$, so that in the limit $N \to \infty$, $F_g = F_G$, as presented
in the main text.
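The Fano-factor relation can be checked in a few lines. The following is a sketch that assumes the large-N form $G(z,t) = \exp[N(g(z,t) - 1)]$ from the main text, and writes $\langle P \rangle$ and $\langle p \rangle$ for the protein means in the original and reduced models; it is offered as a consistency check rather than as the intermediate steps of the original derivation.

```latex
\begin{align}
\langle P \rangle &= \partial_z G \big|_{z=1} = N\, \partial_z g \big|_{z=1} = N \langle p \rangle, \\
\langle P(P-1) \rangle &= \partial_z^2 G \big|_{z=1}
   = N \langle p(p-1) \rangle + \langle P \rangle^2, \\
F_G &= \frac{\langle P^2 \rangle - \langle P \rangle^2}{\langle P \rangle}
   = \frac{\langle p^2 \rangle}{\langle p \rangle}, \qquad
F_g = \frac{\langle p^2 \rangle - \langle p \rangle^2}{\langle p \rangle}
   = F_G - \langle p \rangle = F_G - \frac{\langle P \rangle}{N}.
\end{align}
```

Since $\langle P \rangle$ stays finite as N grows, the correction term vanishes in the limit and the two Fano factors coincide.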

Focusing our attention on the protein distributions, we choose to write $G(z,t) = G(1,z,t)$ and
$g(z,t) = g(1,z,t)$. In the following, we consider the limit $N \to \infty$. In this case, up to any
time t, the production of more than one mRNA by the reduced process is highly unlikely (of second
order in $k_m t/N$) as shown in the main text. In the reduced model, one can therefore neglect all
states with more than one mRNA. Thus we have

$$g(y,z,t) = g_0(z,t) + y\, g_1(z,t) \tag{18}$$

with $g_m(z,t) = \sum_p z^p \phi_m(p,t)$. It follows that

$$g(y,z,t) = g(z,t) + (y - 1)\, g_1(z,t) \tag{19}$$

In the following we show that, at the lowest order, the generating function is such that
$g(z,t) - 1 \propto k_m t/N$. Let us denote by $\phi_m(p,t|m',p',s)$ the probability distribution at
time t with the following condition: $\phi_m(p,t=s|m',p',s) = \delta_{m,m'} \delta_{p,p'}$. Since
the transition rate from the 0 mRNA state to the 1 mRNA state can be made arbitrarily small
($k_m/N$), we can assume that the system has, at maximum, one transition from the state 0 to 1 (in a
given time t). Neglecting all events that include more than one transition $0 \to 1$, it follows
that $\phi(p,t|0,0,0)$, defined by $\phi_0(p,t|0,0,0) + \phi_1(p,t|0,0,0)$, can be written as

$$\phi(p,t|0,0,0) = \delta(p)\, e^{-t k_m/N} + \int_0^t ds\, \frac{k_m}{N}\, e^{-s k_m/N}\, \tilde{\phi}(p,t|1,0,s) \tag{20}$$

where $\exp(-t k_m/N)$ is the probability that we observe no $0 \to 1$ transitions in a time t, while
$\exp(-s k_m/N)\, (k_m/N)\, ds$ is the probability of a transition between time s and $s + ds$. The
distribution $\tilde{\phi}(p,t|1,0,s)$ describes the probability to find p proteins in a process
where all transitions $0 \to 1$ are now neglected, and with the condition $m = 1$ and $p = 0$ at
time $t = s$. The latter distribution $\tilde{\phi}$, and its generating function $\tilde{g}$, are
therefore independent of the ratio $k_m/N$. It follows that the generating function $g(z,t)$ (in our
case $g(z,t) = g(z,t|0,0,0)$) is

$$g(z,t) = e^{-(k_m/N) t} + \int_0^t ds\, \frac{k_m}{N}\, e^{-(k_m/N) s}\, \tilde{g}(z,t|1,0,s) \tag{21}$$

which at the first order in $k_m/N$ leads to

$$g(z,t) = 1 + \frac{k_m}{N} \int_0^t ds \left[ \tilde{g}(z,t|1,0,s) - 1 \right] \tag{22}$$

Using the fact that $\tilde{g}(z,t|1,0,s) = \tilde{g}(z,t-s|1,0,0)$ and defining the dimensionless
variable $a = 1 - s/t$ we obtain

$$g(z,t) = 1 + \frac{k_m t}{N} \int_0^1 da \left[ \tilde{g}(z,at|1,0,0) - 1 \right] \tag{23}$$

and thus $g(z,t) - 1 \propto k_m t/N$ as claimed in the main text.
FIG. 4. (A) The kinetic scheme for a simple model for mRNA production and decay. (B) Reduced model
emerging from the PPA mapping. The probability distribution of the number of mRNAs in (A) is
identical to the probability distribution of the number of systems in the ON state in (B).



B. 2-STAGE MODEL OF GENE EXPRESSION: 
MRNA DISTRIBUTION 

In this section, we show how the Partitioning of Pois- 
son Arrivals (PPA) mapping leads to the distribution of 
mRNA levels for the 2-stage model. In section (A), we 
write down the master equation and define the associated 
generating function G{z,t). The mapping is then intro- 
duced in section (B) by defining the generating function 
g{z,t) of the reduced model. The time dependent solu- 
tion of the reduced process is given in section (C) and 
finally the full generating function G(z, t) is given in sec- 
tion (D). 



A) Master Equation and Generating function 

The master equation for $\Phi_M(t)$, the probability distribution of mRNAs in Fig. 4A, is given by

$$\partial_t \Phi_M(t) = k_m \left[ \Phi_{M-1}(t) - \Phi_M(t) \right] + \mu_m \left[ (M+1)\, \Phi_{M+1}(t) - M\, \Phi_M(t) \right] \tag{24}$$

The equation for the generating function $G(z,t) = \sum_M z^M \Phi_M(t)$ is

$$\partial_t G = k_m (z - 1)\, G - \mu_m (z - 1)\, \partial_z G \tag{25}$$

The exact solution can be obtained by directly solving Eq. 25. However, this problem also provides
an ideal example to illustrate the PPA mapping approach.



B) Mapping 

The PPA mapping connects the original model to N independent, identical reduced models (Fig. 4B). To
explicitly derive it from the master equation, let us write the generating function as
$G = (g)^N$. Substituting in Eq. 25 we see that g and G obey the same equation with the rescaling
$k_m \to k_m/N$:

$$\partial_t g = \frac{k_m}{N} (z - 1)\, g - \mu_m (z - 1)\, \partial_z g \tag{26}$$

For the reduced model, defining $\phi_m(t)$ as the probability to have m mRNAs at time t, we can
write the generating function as $g(z,t) = \phi_0(t) + z \phi_1(t) + z^2 \phi_2(t) + \ldots$. As
discussed, for large N, it is unlikely to find more than one mRNA in the reduced model. In the
stationary state, we have $\phi_0^* \sim 1 - O(1/N)$ and $\phi_m^* \sim O(1/N^m)$ for $m \ge 1$.
Keeping the first order term in $1/N$, the dynamics of the reduced model is effectively described by
the kinetic scheme of an ON-OFF model presented in Fig. 4B.



C) The reduced model: its time dependent solution 

Let us now consider the initial condition $\phi_m(t = 0) = \delta_{m,0}$ so that we have
$\phi_m(t) \sim O(1/N^m)$ for $m \ge 1$ and all time t. To first order in $1/N$, the generating
function of the reduced model is $g(z,t) = \phi_0(t) + z \phi_1(t)$, where $\phi_0(t)$ and
$\phi_1(t)$ obey the master equation of the 2-state model

$$\partial_t \phi_0(t) = -\partial_t \phi_1(t) = -\frac{k_m}{N} \phi_0(t) + \mu_m \phi_1(t) \tag{27}$$

with solution

$$\phi_1(t) = 1 - \phi_0(t) = \left( 1 - e^{-(\mu_m + k_m/N) t} \right) \phi_1^* \tag{28}$$

where $\phi_1^* = (k_m/N)/(\mu_m + k_m/N)$.
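The closed-form solution of the 2-state model can be checked against a direct numerical integration. The sketch below integrates d(phi_1)/dt = (k_m/N)(1 - phi_1) - mu_m phi_1, which follows from the 2-state master equation with phi_0 = 1 - phi_1, using a fixed-step fourth-order Runge-Kutta scheme, and compares it with the analytic expression for phi_1(t). The function names, step count, and parameter values are illustrative assumptions.

```python
import math


def phi1_exact(t, k_m, mu_m, N):
    """Analytic ON-state probability of the 2-state model, starting from OFF."""
    k_on = k_m / N
    phi1_star = k_on / (mu_m + k_on)
    return (1.0 - math.exp(-(mu_m + k_on) * t)) * phi1_star


def phi1_rk4(t_end, k_m, mu_m, N, steps=2000):
    """Integrate d(phi1)/dt = (k_m/N) * (1 - phi1) - mu_m * phi1 with RK4."""
    k_on = k_m / N

    def rhs(x):
        return k_on * (1.0 - x) - mu_m * x

    h = t_end / steps
    phi1 = 0.0                          # initial condition: system is OFF
    for _ in range(steps):
        a = rhs(phi1)
        b = rhs(phi1 + 0.5 * h * a)
        c = rhs(phi1 + 0.5 * h * b)
        d = rhs(phi1 + h * c)
        phi1 += h * (a + 2.0 * b + 2.0 * c + d) / 6.0
    return phi1
```

The two functions should agree to within the integrator's truncation error for any positive rates, and both approach the stationary value (k_m/N)/(mu_m + k_m/N) at long times.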

D) The full generating function 

The full generating function is given by
$G = \lim_{N \to \infty} (g)^N = \lim_{N \to \infty} \exp[N(g - 1)]$ and leads to

$$G(z,t) = \exp\left[ \frac{k_m}{\mu_m} (z - 1) \left( 1 - e^{-\mu_m t} \right) \right] \tag{29}$$

which corresponds to the well-known Poisson distribution of mRNA, with mean
$(k_m/\mu_m)(1 - e^{-\mu_m t})$.
C. 2-STAGE MODEL OF GENE EXPRESSION:
PROTEIN DISTRIBUTION

In this section we show how the PPA mapping allows us to obtain the protein distribution and the
joint mRNA-protein distribution for the 2-stage model (Fig. 5A). In section (A), we write down the
master equation and define the associated generating function $G(y,z,t)$. Details of the mapping are
presented in section (B) by defining the generating function $g(y,z,t)$ of the reduced model. The
time dependent solution of $g(y,z,t)$ is given in section (C) and finally the full generating
function $G(y,z,t)$ is obtained in section (D).

FIG. 5. (A) The kinetic scheme for protein production in the 2-stage model. (B) Reduced model
emerging from the PPA mapping.

A) Master Equation and Generating function

Let us now consider the full probability distribution of the 2-stage model by writing
$\Phi_M(P,t)$, the time-dependent probability distribution, with the master equation:

$$\begin{aligned}
\partial_t \Phi_M(P,t) ={}& k_m \left[ \Phi_{M-1}(P,t) - \Phi_M(P,t) \right] \\
&+ \mu_m \left[ (M+1)\, \Phi_{M+1}(P,t) - M\, \Phi_M(P,t) \right] \\
&+ k_p M \left[ \Phi_M(P-1,t) - \Phi_M(P,t) \right] \\
&+ \mu_p \left[ (P+1)\, \Phi_M(P+1,t) - P\, \Phi_M(P,t) \right]
\end{aligned} \tag{30}$$

The generating function

$$G(y,z,t) = \sum_{M,P} y^M z^P\, \Phi_M(P,t) \tag{31}$$

obeys

$$\partial_t G = k_m (y - 1)\, G - \mu_m (y - 1)\, \partial_y G + k_p (z - 1)\, y\, \partial_y G - \mu_p (z - 1)\, \partial_z G \tag{32}$$


B) Mapping 

Following the steps presented in the previous section, we define $g(y,z,t)$ such that $G = (g)^N$.
We see that g is governed by

$$\partial_t g = \frac{k_m}{N} (y - 1)\, g - \mu_m (y - 1)\, \partial_y g + k_p (z - 1)\, y\, \partial_y g - \mu_p (z - 1)\, \partial_z g \tag{33}$$

Again, we see that g corresponds to the generating function of the 2-stage model under the rescaling
$k_m \to k_m/N$. For large N values, the production of two or more mRNAs in the reduced model is
unlikely and can be neglected. In the limit $N \to \infty$ the generating function can be written as
$g(y,z,t) = \sum_p z^p \left[ \phi_0(p,t) + y\, \phi_1(p,t) \right]$. Its dynamics is effectively
described by the kinetic scheme presented in Fig. 5B. Starting with the initial condition
$\phi_m(p,t=0) = \delta_{m,0} \delta_{p,0}$, we have $\phi_m(p,t) \sim 1/N^m$ for $m \ge 1$ and
$\forall t$.



C) The reduced model: its time dependent solution 

Let us write g in the form $g(y,z,t) = g_0(z,t) + y\, g_1(z,t)$, where $g_0(z,t)$ and $g_1(z,t)$ are
the generating functions defined by $g_m(z,t) = \sum_p z^p \phi_m(p,t)$ ($m = 0, 1$). The latter
quantities obey the coupled equations

$$\partial_t g_0 = -\mu_p (z - 1)\, \partial_z g_0 - \frac{k_m}{N} g_0 + \mu_m g_1 \tag{34}$$

$$\partial_t g_1 = -\mu_p (z - 1)\, \partial_z g_1 + k_p (z - 1)\, g_1 - \mu_m g_1 + \frac{k_m}{N} g_0 \tag{35}$$

Summing these two equations and writing $g(z,t) = g(1,z,t)$, we get

$$g_1(z,t) = \frac{1}{k_p (z - 1)}\, \partial_t g(z,t) + \frac{\mu_p}{k_p}\, \partial_z g(z,t) \tag{36}$$

which allows us to write $g(y,z,t)$ as

$$g(y,z,t) = g(z,t) + (y - 1) \frac{\mu_p}{k_p}\, \partial_z g(z,t) + \frac{y - 1}{k_p (z - 1)}\, \partial_t g(z,t) \tag{37}$$

Let us first consider the result for protein distributions in the stationary state. Based on
previous work (13-15), we obtain the stationary solution of the reduced model

$$g^*(z) = {}_1F_1\left( \frac{k_m}{N \mu_p};\ \frac{k_m}{N \mu_p} + \frac{\mu_m}{\mu_p};\ \frac{k_p}{\mu_p}(z - 1) \right) \tag{38}$$

where ${}_1F_1$ is the confluent hypergeometric function. Furthermore, the time-dependent solution
for the protein distribution in the reduced model has been obtained in previous work (15)



;(z,t) = ^,(t)iFi('^^^;^;^(z-l) 



f-^p l^p f^p 



(39) 



1^7] 



^„s(t)iFi(l-^;2 ^'"- ^P 



Mp 



^p ^p 



(z-l) 



with 
Fs{t) = iFi 



km/N ^ ^ ^ 



MP 



flp flp 



^e-'^'"*(z-l) 



(40) 



NflmifJ-p -Mm) 

X iFi f^; 1 + ^; -^e-'^'"*(z - 1)) 
\ A^p A'p A'p / 

D) The full generating function 

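From $G = (g)^N$, it is readily shown that the original generating function is given by

$$G(y,z,t) = \lim_{N \to \infty} e^{N \mathcal{F}[g(z,t)]} \tag{42}$$

with

$$\mathcal{F}[g(z,t)] = g(z,t) + (y - 1) \frac{\mu_p}{k_p}\, \partial_z g(z,t) + \frac{y - 1}{k_p (z - 1)}\, \partial_t g(z,t) - 1 \tag{43}$$

and in the steady-state

$$G^*(y,z) = \lim_{N \to \infty} \exp\left\{ N \left[ \left( 1 + (y - 1) \frac{\mu_p}{k_p}\, \partial_z \right) {}_1F_1\left( \frac{k_m}{N \mu_p};\ \frac{k_m}{N \mu_p} + \frac{\mu_m}{\mu_p};\ \frac{k_p}{\mu_p}(z - 1) \right) - 1 \right] \right\} \tag{44}$$

In the following, we show that the steady-state distribution derived above is equivalent to the
exact result derived in recent work (22). By the definition of the hypergeometric functions we have
$\frac{d}{dx}\, {}_1F_1(\alpha; \beta; \gamma x) = \frac{\alpha \gamma}{\beta}\, {}_1F_1(\alpha + 1; \beta + 1; \gamma x)$
or
${}_1F_1(\alpha; \beta; \gamma x) = 1 + \frac{\alpha \gamma}{\beta} \int_0^x {}_1F_1(\alpha + 1; \beta + 1; \gamma s)\, ds$.
Using this relation in the preceding equation for $G^*(z)$ ($= G^*(1,z)$), we obtain:

$$G^*(z) = \exp\left[ \frac{k_m k_p}{\mu_m \mu_p} \int_1^z {}_1F_1\left( 1;\ 1 + \frac{\mu_m}{\mu_p};\ \frac{k_p}{\mu_p}(s - 1) \right) ds \right] \tag{45}$$

which is exactly the result derived in previous work (22).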



D. MODEL WITH DELAYED DEGRADATION 

We consider an extension of the 2-stage model in which the proteins degrade in two steps. First,
proteins are tagged (with rate $\gamma$) and after being tagged they are degraded with a fixed time
delay of $\tau$ (Fig. 3A). The corresponding reduced model, obtained using the PPA mapping approach,
is shown in Fig. 3B.

To obtain the exact solution for the steady-state protein distribution, we categorize the proteins
at a given time t (with t large enough such that the system is in steady-state) into two groups:
tagged and untagged proteins. Then, at time $t + \tau$, all the tagged proteins will have degraded
and the untagged proteins will survive. During the time-interval $\tau$, mRNAs give rise to new
proteins that are added to the system. These new proteins will also survive up to time $t + \tau$.
Thus, the random variable corresponding to the number of proteins in the system at time $t + \tau$
is the sum of two independent random variables: the number of untagged proteins at time t and the
number of proteins created in the time interval $[t, t + \tau]$. Let us denote the corresponding
generating functions as follows: total proteins ($Q(z)$), proteins untagged at time t ($U(z)$) and
proteins created in the time interval $[t, t + \tau]$ ($W(z)$). Since the total number of proteins
is the sum of the other two independent random variables, we have $Q(z) = U(z) W(z)$.
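The product rule for generating functions of independent random variables corresponds to a convolution of the underlying distributions. The sketch below illustrates this with two Poisson distributions standing in for the untagged and newly created protein counts; the Poisson choice and the function names are assumptions made purely for illustration and are not the distributions derived in this appendix.

```python
import math


def convolve(pmf_u, pmf_w):
    """Distribution of U + W for independent U and W with finite support."""
    out = [0.0] * (len(pmf_u) + len(pmf_w) - 1)
    for i, a in enumerate(pmf_u):
        for j, b in enumerate(pmf_w):
            out[i + j] += a * b
    return out


def pgf(pmf, z):
    """Evaluate the generating function sum_n z^n P(n)."""
    return sum(p * z ** n for n, p in enumerate(pmf))


def poisson_pmf(mean, n_max):
    """Poisson probabilities P(0..n_max), used here only as a stand-in."""
    return [math.exp(-mean) * mean ** n / math.factorial(n)
            for n in range(n_max + 1)]
```

For any z, `pgf(convolve(u, w), z)` equals `pgf(u, z) * pgf(w, z)` up to rounding, which is the discrete counterpart of the relation between Q, U and W.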



The distribution of untagged proteins at time t is the same as the steady-state distribution of
proteins in the basic two-stage model (with degradation rate in the basic two-stage model set equal
to the tagging rate $\gamma$). The corresponding generating function has been obtained in previous
work (13) and is given by

$$U(z) = \lim_{N \to \infty} \left[ {}_1F_1\left( \frac{k_m}{N \gamma};\ \frac{k_m}{N \gamma} + \frac{\mu_m}{\gamma};\ \frac{k_p}{\gamma}(z - 1) \right) \right]^N \tag{46}$$



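Now, we consider the proteins created in the time interval $\tau$. For the reduced model, let
$W_0(z)$ and $W_1(z)$ be the generating functions for the protein distribution corresponding to the
system being in OFF and ON states respectively. The following master equations govern the evolution
of $W_0(z)$ and $W_1(z)$:

$$\frac{dW_0}{dt} = -\frac{k_m}{N} W_0 + \mu_m W_1 \tag{47}$$

$$\frac{dW_1}{dt} = \frac{k_m}{N} W_0 - \mu_m W_1 + k_p (z - 1)\, W_1 \tag{48}$$

therefore:

$$W_1 = \frac{1}{k_p (z - 1)} \frac{dW}{dt} \tag{49}$$

$$W_0 = \frac{-1}{k_p (z - 1)} \frac{dW}{dt} + W \tag{50}$$

where $W(z) = W_0(z) + W_1(z)$. Correspondingly, we obtain the following equation for $W(z)$:

$$\frac{d^2 W}{dt^2} - \left[ k_p (z - 1) - \mu_m - \frac{k_m}{N} \right] \frac{dW}{dt} - \frac{k_m}{N}\, k_p (z - 1)\, W = 0 \tag{51}$$

The solution of this ordinary differential equation is given by (13):

$$W(z,t) = C_1\, e^{(\alpha(z) - \beta(z)) t} + C_2\, e^{(\alpha(z) + \beta(z)) t} \tag{52}$$

where $\alpha(z)$ and $\beta(z)$ are:

$$2\alpha(z) = k_p (z - 1) - \mu_m - \frac{k_m}{N} \tag{53}$$

$$(2\beta(z))^2 = k_p^2 (z - 1)^2 + 2 \left( \frac{k_m}{N} - \mu_m \right) k_p (z - 1) + \left( \mu_m + \frac{k_m}{N} \right)^2 \tag{54}$$

To obtain $C_1$ and $C_2$ we use the initial conditions. Since we are in the steady-state limit, the
initial conditions are:

$$W_0(z,0) = \frac{N \mu_m}{k_m + N \mu_m}, \qquad W_1(z,0) = \frac{k_m}{k_m + N \mu_m} \tag{55}$$

Using the above, we get:

$$C_1 = \frac{(\beta(z) + \alpha(z)) - k_p (z - 1)\, W_1(0)}{2 \beta(z)} \tag{56}$$

$$C_2 = \frac{(\beta(z) - \alpha(z)) + k_p (z - 1)\, W_1(0)}{2 \beta(z)} \tag{57}$$

For $N \to \infty$ and $t = \tau$:

$$W(z,\tau) \approx 1 + \frac{1}{N}\, \frac{k_m k_p (z - 1)}{\mu_m^2}\, \frac{1}{1 - \frac{k_p}{\mu_m}(z - 1)} \left[ \mu_m \tau - \frac{\frac{k_p}{\mu_m}(z - 1)}{1 - \frac{k_p}{\mu_m}(z - 1)} \left( 1 - e^{-\mu_m \tau \left( 1 - \frac{k_p}{\mu_m}(z - 1) \right)} \right) \right] \tag{58}$$

The generating function of the original model is

$$G(z) = \exp\left\{ \frac{k_m k_p (z - 1)}{s(z)} \left[ \tau - \frac{k_p (z - 1)}{\mu_m\, s(z)} \left( 1 - e^{-s(z) \tau} \right) \right] \right\} \times \lim_{N \to \infty} \exp\left\{ N \left[ {}_1F_1\left( \frac{k_m}{N \gamma};\ \frac{k_m}{N \gamma} + \frac{\mu_m}{\gamma};\ \frac{k_p}{\gamma}(z - 1) \right) - 1 \right] \right\} \tag{59}$$

where $s(z) = \mu_m - k_p (z - 1)$.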